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Abstract 

This paper presents an unsupervised algorithm for nonlinear unmixing of hyperspectral images. 
The proposed model assumes that the pixel reflectances result from a nonlinear function of the 
abundance vectors associated with the pure spectral components. We assume that the spectral 
signatures of the pure components and the nonlinear function are unknown. The first step of the 
proposed method consists of the Bayesian estimation of the abundance vectors for all the image 
pixels and the nonlinear function relating the abundance vectors to the observations. The endmembers 
are subsequently estimated using Gaussian process regression. The performance of the unmixing 
strategy is evaluated with simulations conducted on synthetic and real data. 

Index Terms 

Hyperspectral images, nonlinear spectral unmixing, unsupervised unmixing, Gaussian process 
regression, Bayesian estimation. 

I. Introduction 

Spectral unmixing (SU) is a major issue when analyzing hyperspectral images. It consists 
of identifying the macroscopic materials present in an hyperspectral image and quantifying 
the proportions of these materials in the image pixels. Many SU strategies assume that 
pixel reflectances are linear combinations of pure component spectra [1]. The resulting linear 
mixing model (LMM) has been widely adopted in the literature and has provided some 
interesting results. However, as discussed in (TJ, the LMM can be inappropriate for some 
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hyperspectral images, such as those containing sand, trees or vegetation areas. Nonlinear 
mixing models provide an interesting alternative to overcome the inherent limitations of the 
LMM. Nonlinear mixing models recently proposed in the literature include the bidirectional 
reflectance-based model of j2j for hyperspectral images including intimate mixtures, i.e., 
when the photons are intereacting with all the materials simultaneously. Such mixtures may 
occur for instance in sand or mineral areas. Another class of nonlinear model referred to as 
bilinear models have been studied in [[3j-[[6j for modeling scattering effects (mainly observed 
in vegetation areas). Other more flexible unmixing techniques have been also proposed to 
handle wider classes of nonlinearity, including radial basis function networks [J7J, [[8j post- 



nonlinear mixing models [9] and kernel-based models [10|-|13|. 

Most existing unmixing strategies can be decomposed into two steps referred to as end- 
member extraction and abundance estimation. Endmember identification is usually achieved 
before estimating the abundances for all the image pixels. In the last decade, many endmember 
extraction algorithms (EEAs) have been developed to identify the pure spectral components 
contained in a hyperspectral image (see [ 1 ] for a recent review of these methods). Most EEAs 
rely on the LMM, which, as discussed, is inappropriate for the case of nonlinear mixtures 
of the endmembers. More recently, an EEA was proposed in [14] to extract endmembers 
from a set of nonlinearly mixed pixels. This paper proposes first to estimate the abundance 
vectors and to estimate the endmembers during a second step, using the prediction capacity 
of Gaussian processes (GPs). This approach breaks from the usual paradigm of spectral 
unmixing. More precisely, this paper considers a kernel-based approach for nonlinear SU 
based on a nonlinear dimensionality reduction using a Gaussian process latent variable model 
(GPLVM). The main advantage of GPLVMs is their capacity to accurately model many 
different nonlinearities. In this paper, we propose to use a particular form of kernel based 
on existing bilinear models, which allows the proposed unmixing strategy to be accurate 
when the underlying mixing model is bilinear. Note that the LMM is a particular bilinear 
model. The algorithm proposed herein is "unsupervised" in the sense that the endmembers 
contained in the image and the mixing model are not known. Only the number of endmembers 
is assumed to be known. As a consequence, the parameters to be estimated are the kernel 
parameters, the endmember spectra and the abundances for all image pixels. 

The paper is organized as follows. Section |n] presents the nonlinear mixing model consid- 



ered in this paper for hyperspectral image unmixing. Section III introduces the GPLVM used 
for latent variable estimation. The constrained GPLVM for abundance estimation is detailed 
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in Section [TV] Section IV] studies the endmember estimation procedure using GP regression. 



Some simulation results conducted on synthetic data are shown and discussed in Section VI 



Finally, conclusions are drawn in Section VII 



II. Nonlinear mixing model 

Consider a hyperspectral image of N pixels, composed of R endmembers and observed in 
L spectral bands. For convenience, the data are assumed to have been previously centered. The 
L-spectrum y(n) = [yi(n), . . . , yL{n)} T of the nth mixed pixel (n = 1, . . . , N) is defined as a 
transformation of its corresponding abundance vector a{n) = [ai(n), . . . , a^(n)] T as follows 

y(n) = g[a(n)]+e(n), n=l,...,N (1) 

where g : 1H R — > IR L is a linear or nonlinear unknown function. The noise vector e(n) is 
an independent, identically distributed (i.i.d.) white Gaussian noise sequence with variance 
a 2 , i.e., e(n) ~ N (e(n)\0 L , o~ 2 Il) , n = 1,...,N. Without loss generality, the nonlinear 
mapping ([TJ) from the abundance space to the observation space can be rewritten 

y(n) = W(W[a(n)]+e(n), n = l,...,N (2) 

where ip : IR R — > IR D , Wo is an L x D matrix and the dimension D is the dimension of the 
subspace spanned by the transformed abundance vectors ?/> [a(n)] , n — 1, . . . , N. Of course, 
the performance of the unmixing strategy relies on the choice of the nonlinear function ip. 
In this paper, we will use the following nonlinearity 

a ^^[a] = [ai,...,a R ,aia 2 ...,a R - 1 a R } T , (3) 

with D = R(R + l)/2. The primary motivation for considering this particular kind of 
nonlinearity is the fact that the resulting mixing model is a bilinear model with respect to 
each abundance a r , r — 1, . . . , R. More precisely, this mixing model extends the generalized 
bilinear model proposed in (6j and thus the LMM. It is important to note from ([2]) and Q 
that V^o contains the R spectra of the pure components present in the image and R(R — l)/2 
interaction spectra between these components. Note also that the analysis presented in this 
paper could be applied to any other nonlinearity -0. 

Due to physical constraints, the abundance vector a{n) = [ai(n), . . . , a R (n)] T satisfies the 
following positivity and sum-to-one constraints 

R 

^2a r (n) = l, a r (n) > 0, Vr £ {1, . . . , R} . (4) 

r=l 
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Since the nonlinearity if} is fixed, the problem of unsupervised spectral unmixing is to deter- 
mine the L x D spectrum matrix Wo, the N x R abundance matrix A = [a(l), . . . , a(N)] T 
satisfying ([2]) under the constraints and the noise variance a 2 . Unfortunately, it can be 
shown that the solution of this constrained problem is not unique. In the noise-free linear 
case, it is well known that the data are contained in a simplex whose vertices are the 
endmembers. When estimating the endmembers in the linear case, a simplex of minimum 
volume embedding the data is expected. Equivalently, the estimated abundance vectors are 
expected to occupy the largest volume in the simplex defined by Q. In a similar fashion 
to the linear case, the estimated abundance matrix resulting from an unsupervised nonlinear 
SU strategy may not occupy the largest volume in the simplex defined by Q. To tackle this 
problem, we first propose to relax the positivity constraints for the elements of the matrix 
A and to consider only the sum-to-one constraint. For ease of understanding, we introduce 
R x 1 vectors satisfying the sum-to-one constraint 

R 

^ar P (n) = l, n = l,...,N (5) 

r=l 

referred to as latent variables and denoted as x(n) = [xi(n), . . . , xn(n)] T , n = 1, . . . , N. 
The positivity constraint will be handled subsequently by a scaling procedure discussed in 



Section IV The next section presents the Bayesian model for latent variable estimation using 
GPLVMs. 



III. Bayesian model 

GPLVMs p"5| are powerful tools for probabilistic nonlinear dimensionality reduction that 
rewrite the nonlinear model ([T]) as a nonlinear mapping from a latent space to the observation 
space as follows 

y (n) = Wij>[x(n)]+e(n), n = l,...,N (6) 

where if} is defined in (|3]), W = [wx, . . . , Wl} t is an LxD matrix with w e = [w^x, . . . , w^d] t , 
and D = R(R+l)/2. Note that from and ([6]) the columns of W span the same subspace as 
the columns of Wo- Consequently, the columns of W are linear combinations of the spectra 
of interest, i.e., the columns of Wo- Note also that when W is full rank, it can be shown that 
the latent variables are necessarily linear combinations of the abundance vectors of interest. 
Figs. [T] and [2] illustrate the mapping from the abundance vectors to the observations that will 
be used in this paper. Note that the linear mapping between the abundances and the latent 
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variables will be explained in detail in Section IV For brevity, the D x 1 vectors ?/> [cc(n)] 
will be denoted as ip x (n) in the sequel. Assuming independence between the observations, 
the statistical properties of the noise lead to the following likelihood of the N x L observation 
matrix Y = [y(l), . . . , y(iV)] T 

N 

Y\W, X, a 2 ~ Y[Af(y{n)\W^ x (n), a 2 \ L ) (7) 

n=l 

where X = [cc(l), . . . , x(N)] T is the N x R latent variable matrix. Note that the likelihood 
can be rewritten as a product of Gaussian distributions over the spectral bands as follows 

L 

Y\W, X, a 2 ~ Y[Af{ye\V x w e , cr 2 I L ) (8) 

i=\ 

where Y = [y 1; . . . , y L ] and & x = [ift x (l), ■ ■ ■ , t^ x (N)] T is an iV x D matrix. The idea of 
GPLVMs is to consider If as a nuisance parameter, to assign a Gaussian prior to W and 
to marginalize the joint likelihood ([7]) over W, i.e., 



f(Y\X,a 2 ) = j f{Y\W,X,a 2 )f{W)dW 



(9) 



where f(W) is the prior distribution of W. The estimation of X and a 2 can then be 
achieved by maximizing (|9]) following the maximum likelihood estimator (MLE) principle. 
An alternative consists of using an appropriate prior distribution f(X,a 2 ), assuming prior 
independence between W and (X,a 2 ), and maximizing the joint posterior distribution 

f{X,a 2 \Y) ex J f(Y\W,X 1 a 2 )f(W)f(X,a 2 )dW 
oc f(X,a 2 ) [ f(Y\W,X,a 2 )f(W)dW 



ex f(Y\X,a 2 )f(X,a 2 ) (10) 

with respect to (w.r.t.) (X,a 2 ), yielding the maximum a posteriori (MAP) estimator of 
(X,cr 2 ). The next paragraph discusses different possibilities for marginalizing the joint 
likelihood © w.r.t. W. 

A. Marginalizing W 

It can be seen from (|9]) that the marginalized likelihood and thus the associated latent 
variables depend on the choice of the prior f(W). More precisely, assigning a given prior for 
W favors particular representations of the data, i.e., particular solutions for the latent variable 



matrix X maximizing the posterior ( 10). When using GPLVMs for dimensionality reduction, 



July 24, 2012 



DRAFT 



an 



Linear 
mapping' 



x n 



Nonlinear 
mapping 



Linear 
mapping" 



9 



9 



e(n) 



Fig. 1. Nonlinear mapping from the abundances vectors to the observed mixed pixels. 
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Fig. 2. Example of mapping decomposition from the abundance vectors to the observed nonlinearly mixed pixels through 
the latent variables {R = 3). 



a classical choice |15| consists of assigning independent Gaussian priors for Wi, 
leading to 

DL T 

^— r i .. 



f(W) 



1 

2^ 



H exp 



l=\ 



\ w l\ 



,w L , 



(11) 



However, this choice can be inappropriate for SU. First, Eq. ( |TT| ) can be incompatible with the 
admissible latent space, constrained by Q. Second, the prior ( fTTj ) assumes the columns of W 
(linear combinations of the spectra of interest) are a priori Gaussian, which is not relevant 
for real spectra in most applications. A more sophisticated choice consists of considering 
a priori correlation between the columns (inter- spectra correlation) and rows (inter-bands 
correlation) of W using a structured covariance matrix to be fixed or estimated. In particular, 
introducing correlation between close spectral bands is of particular interest in hyperspectral 
imagery. Structured covariance matrices have already been considered in the GP literature for 
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vector- valued kernels |16j (see JT7J for a recent review). However, computing the resulting 
marginalized likelihood usually requires the estimation of the structured covariance matrix and 
the inversion of an NL x NL covariance matrix^ which is prohibitive for SU of hyperspectral 
images since several hundreds of spectral bands are usually considered when analyzing real 
data. Sparse approximation techniques might be used to reduce this computational complexity 
(see JT9| for a recent review). However, to our knowledge, these techniques rely on the 
inversion of matrices bigger than N x N matrices. The next section presents an alternative 
that only requires the inversion of an D x D covariance matrix without any approximation. 



B. Subspace identification 

It can be seen from ([6]) that in the noise-free case, the data belong to a D-dimensional 
subspace that is spanned by the columns of W. To reduce the computational complexity 
induced by the marginalization of the matrix W while considering correlations between 
spectral bands, we propose to marginalize a basis of the subspace spanned by W instead of 
W itself. More precisely, W can be decomposed as follows 

W = PU T (12) 

where P = [p 1 , . . . ,p L ] T is an L x D matrix (p e is D x 1 vector) whose columns are 
arbitrary basis vectors of the D-dimensional subspace that contains the subspace spanned by 
the columns of W and U = [ui, . . . , u D ] T is a D x D matrix that scales the columns of P. 
Note that the subspaces spanned by P and W are the same when W is full rank, resulting 
in a full rank matrix U. The joint likelihood ([8]) can be rewritten as 

L 

Y\P, U,X,a 2 ~ Y[AT {y e \C Pe , a 2 I L ) (13) 

e=i 

where C = ^ X U is an iV x D matrix. The proposed subspace estimation procedure consists 
of assigning an appropriate prior distribution to P (denoted as f(P)) and to marginalize P 
from the joint posterior of interest. It is easier to choose an informative prior distribution 
f(P) that accounts for correlation between spectral bands than choosing an informative 
f(W) since P is an arbitrary basis of the subspace spanned by W, which can be easily 
estimated (as will be shown in the next section). 



'See technical report 



18 



for further details. 
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C. Parameter priors 

GPLVMs construct a smooth mapping from the latent space to the observation space that 



preserves dissimilarities [20|. In the SU context, it means that pixels that are spectrally 
different have different latent variables and thus different abundance vectors. However, pre- 
serving local distances is also interesting: spectrally close pixels are expected to have similar 
abundance vectors and thus similar latent variables. Several approaches have been proposed 
to preserve similarities, including back-constraints [201, dynamical models pT[ and locally 



linear embedding (LLE) [22|. In this paper, we use LLE to assign an appropriate prior to 
X. First, the K nearest neighbors {y(j)}jeu z of each observation vector y(i) are computed 
using the Euclidian distance (Vj is the set of integers j such that y(j) is a neighbor of y(i)). 
The weight matrix Alle = of size N x N providing the best reconstruction of y(i) 
from its neighbors is then estimated as 



A 



LLE 



N 

arg min 

i=l 



(14) 



Note that the solution of ( |14| ) is easy to obtain in closed form since the criterion to optimize 
is a quadratic function of A. Note also that the matrix A is sparse since each pixel is only 
described by its K nearest neighbors. The locally linear patches obtained by the LLE can 
then be used to set the following prior for the latent variable matrix 



/(X|Allk,7) oc exp 



JV 

E 

»=i 



N 



X 



n ^ [*(»)] 



(15) 



n=l 



where 7 is a fixed hyperparameter to be adjusted and 1©(-) is the indicator function over the 
set V defined by the constraints ([5]). 

In this paper, we propose to assign a prior to P using the standard principal component 
analysis (PC A) (note again that the data have been centered). Assuming prior independence 
between p 1 ,...,p L , the following prior is considered for the matrix P 



2ns 2 



NL T 

-r L 



n 



exp 



1 

'2s 



2 nPi 



PtW 



(16) 



where P = [p 1( . . . ,p L ] T is an L x D projection matrix containing the first D eigenvectors 
of the sample covariance matrix of the observations (provided by PCA) and s 2 is a dispersion 
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parameter that controls the dispersion of the prior. Note that the correlation between spectral 
bands is implicitly introduced through P. 

Non-informative priors are assigned to the noise variance a 2 and the matrix U, i.e, 

f(a 2 ) oc 1 (0 ,5 ct2 )((t 2 ) 

where the intervals (0, 8 a i) and (— 8u, 8u) cover the possible values of the parameters a 2 
and U. Similarly, the following non-informative prior is assigned to the hyperparameter s 2 



(17) 



f(s 2 ) OC l (0 ,5 a2 )(s 2 ) 



(18) 



where the interval (0, 5 S 2) covers the possible values of the hyperparameter s 2 . The resulting 
directed acyclic graph (DAG) is depicted in Fig. [3j 







w 


p 


u 







Y 




Fig. 3. DAG for the parameter priors and hyperpriors (the fixed parameters appear in dashed boxes). 



D. Marginalized posterior distribution 

Assuming prior independence between P, X, U, s 2 and a 2 , the marginalized posterior 
distribution of = (X, U, s 2 , a 2 ) can be expressed as 

/(0|Y,A LLE ,P, 7 ) 

oc /(0|A LLE)7 ) J f(Y\P,6)f(P\P,s 2 )dP 

ex /(Y|0,P)/(0|A LLE)7 ) (19) 
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where /(0|Alle, 7) = f(X\A hLE , ^) f (U) f (s 2 ) f (a 2 ) . Straightforward computations leads 
to 

f(Y\6,P) = J f(Y\P,6)f(P\P,s 2 )dP 



oc 



■oc 



L 1 

n4 



cxp 



isi-f 



cxp 



--trfE^YY 7 ) 

2 v ; 



(20) 



Y-CP 



where £ = s 2 CC T + a 2 \ N , y e = y e — Cp e is an A^x 1 vector, Y = [y 1; 
is an iV x L matrix and tr(-) denotes the matrix trace. 

Mainly due to the nonlinearity introduced through the nonlinear mapping, a closed form 



expression for the parameters maximizing the joint posterior distribution ( [19] ) is impossible 
to obtain. We propose to use a scaled conjugate gradient (SCG) method to maximize the 
marginalized log-posterior. To ensure the sum-to-one constraint for X, the following arbitrary 
reparametrization 

R-l 

x R {n) = l-} j x r {ri), n = l,...,N 



r=l 



is used and the marginalized posterior distribution is optimized w.r.t. the first (R— 1) columns 
of X denoted X\ R . The partial derivatives of the log-posterior w.r.t. X\ R , U, s 2 and a 2 are 
obtained using partial derivatives w.r.t. S and Y and the classical chain rules (see technical 
report p"8| for further details). The resulting latent variable estimation procedure is referred 
to as locally linear GPLVM (LL-GPLVM). 

Note that the marginalized likelihood reduces to the product of L independent Gaussian 
probability density functions since 

y e \p e , U, X, a 2 , s 2 ~N (Cp e , s 2 CC T + a 2 l N ) (21) 

and £ = 1, . . . , L. Note also that the covariance matrix S = s 2 CC T + ct 2 Itv is related to 
the covariance matrix of the 2nd order polynomial kernel [23, p. 89]. More precisely, the 
proposed nonlinear mapping corresponds to a particular polynomial kernel whose metric is 



induced by the matrix U. Finally, note that the evaluation of the marginalized likelihood p0| ) 
only requires the inversion of the N x N covariance matrix S. It can been seen from the 



following Woodbury matrix identity [24| 



a 



-2 



i N -c(a 2 s- 2 i D + c T cy 1 c T 



(22) 
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that the computation of mainly relies on the inversion of a D x D matrix. Similarly, the 
computation of |S| = 1/|S _1 | mainly consists of computing the determinant of a D x D 
matrix, which reduces the computational cost when compared to the structured covariance 



matrix based approach presented in Section III-A 



E. Estimation of P 

Let us denote as 6 = (X,U,s 2 ,a 2 ) the maximum a posteriori (MAP) estimator of 6 = 
(X, U,s 2 ,a 2 ) obtained by maximizing ( fT9| ). Using the likelihood ( fj"3| ), the prior distribution 
( [To} and Bayes' rule, we obtain the posterior distribution of P conditioned upon 6, i.e., 

L 

P\Y,0,P~ l[M(p e \p t S) (23) 

l=\ 

where S^ 1 = a~ 2 C T C + s~ 2 Id and p e = S(C T ye — Pg). Since the conditional posterior 
distribution of P is the product of L independent Gaussian distributions, the MAP estimator 
of P conditioned upon is given by 

Y T C -p)s (24) 



where S 1 = a~ 2 C T C + s- 2 \ D , C = $ X C7, $ x = ty> 4 (l), . . . , *I>$(N)] T and X = 
[x(l), . . . , x(N)] T . The next section studies a scaling procedure that estimates the abundance 



matrix using the estimated latent variables resulting from the maximization of ( |T9| ). 

IV. Scaling procedure 



The optimization procedure presented in Section III-D provides a set of latent variables 
that represent the data but can differ from the abundance vectors of interest. Consider X = 
[X\ R , In — X\r1r-i] obtained after maximization of the posterior ( fT9] ). The purpose of this 
section is to estimate an N x R abundance matrix A = [o(l), . . . , a(N)] T such that 

X\ R = AVl^ + E (25) 

where a(l), . . . ,a(N) occupy the maximal volume in the simplex defined by @, Vr-i = 
[vi, . . . , Vr) is an (R — 1) x R matrix and E is an N x (R — 1) standard i.i.d Gaussian 
noise matrix which models the scaling errors. Since X satisfy the sum-to-one constraint ([5]), 
estimating the relation between X\r and A is equivalent to estimate the relation between 
X and A. However, when considering the mapping between X and A, non-isotropic noise 
has to be considered since the rows of X and A satisfy the sum-to-one constraint, i.e., they 
belong to the same (R — 1) -dimensional subspace. 
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Eq. ( |25] ) corresponds to a LMM whose noisy observations are the rows of X\ R . Since A 
is assumed to occupy the largest volume in the simplex defined by the columns of V R -\ 
are the vertices of the simplex of minimum volume that contains X\ R . As a consequence, it 
seems reasonable to use a linear unmixing strategy for the set of vectors X\ R (1), . . . , x\ R (N) 
to estimate A and V R -\. In this paper, we propose to estimate jointly A and V R -\ using the 
Bayesian algorithm presented in |25] for unsupervised SU assuming the LMM. Note that the 



algorithm in [25 1 assumed positivity constraints for the estimated endmembers. Since these 



constraints for V R -\ are unjustified, the original algorithm has slightly been modified by 
removing the truncations in the projected endmember priors (see p5| for details). Once the 

estimator (A, V R ^i) of (A, V r-i) has been obtained by the proposed scaling procedure, the 

- — -( c ) 

resulting constrained latent variables denoted as X = [x^(l),. . . ,x^ cS) (N)] T are defined 
as follows 



X 



(c) 



X 



(c) 



-—(c) 

Ijv — X\ R l R ^i 



(26) 



with X\ R = AV R _ 1 . Using the sum-to-one constraint A1 R = ljy, we obtain 



X 



A 



(27) 



AV 



R 



- T 



where V r = [V R _ 1 , 1 R — V R _ 1 1 R _ 1 ] T is an R x R matrix. The final abundance estimation 



procedure, including the LL-GPLVM presented in Section III and the scaling procedure 
investigated in this section is referred to as fully constrained LL-GPVLM (FCLL-GPLVM) 
(a detailed algorithm is available in UM). Once the final abundance matrix A and the matrix 
V R have been estimated, we propose an endmember extraction procedure based on GP 
regression. This method is discussed in the next section. 



V. Gaussian process regression 

Endmember estimation is one of the main issues in SU. Most of the existing EEAs intend to 
estimate the endmembers from the data, i.e., selecting the most pure pixels in the observed 



image [26 1— [28]. However, these approaches can be inefficient when the image does not 
contain enough pure pixels. Some other EEAs based on the minimization of the volume 
containing the data (such as the minimum volume simplex analysis p9| ) can mitigate the 
absence of pure pixels in the image. This section studies a new endmember estimation strategy 
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based on GP regression for nonlinear mixtures. This strategy can be used even when the scene 
does not contain pure pixels. It assumes that all the image abundances have been estimated 
using the algorithm described in Section 



IV 



Consider the set of pixels {y(n)} 



n=l....,N 



and 



corresponding estimated abundance vectors {a(n)} n=l N . GP regression first allows the 
nonlinear mapping g(-) in ([T]) (from the abundance space to the observation space) to be 
estimated. The estimated mapping is denoted as g(-). Then, it is possible to use the prediction 
capacity of GPs to predict the spectrum g{cx) corresponding to any new abundance vector 
ol. In particular, the predicted spectra associated with pure pixels, i.e., the endmembers, 
correspond to abundance vectors that are the vertices of the simplex defined by @. This 
section provides more details about GP prediction for endmember estimation. 

It can be seen from the marginalized likelihood ( [20] ) that f(Y\X,P,U,s 2 ,a 2 ) is the 



product of L independent GPs associated with each spectral band of the data space plj ). 
Looking carefully at the covariance matrix of yi (i.e., to £ = s 2 CC T + a 2 \^), we can write 

y t = z £ + e e (28) 

where is the JV x 1 white Gaussian noise vector associated with the £th spectral band 
(having covariance matrix a 2 \ N ) ancQ 

z e ~Af(z e \* x Up e ,K) (29) 

with K = s 2 ty x UU T i&z the N x N covariance matrix of Zt. The N x 1 vector zi is referred 
to as hidden vector associated with the observation y e . Consider now an L x 1 test data with 
hidden vector z* = [z{, ...,zl\ T , abundance vector a* = [a*, ...,a* R ] T and if)* x = if) [VrCx*]. 
We assume that the test data share the same statistical properties as the training data yi, 
in the sense that [zj, z%] is a Gaussian vector such that 



K 



K,(Ot 



(30) 



where a 2 , = s 2 ^*^ r UU T tf) x is the variance of z\ and K,(a*) contains the covariances 
between the training inputs and the test inputs, i.e., 

k(c?) = s 2 il>* x T UU T V x . (31) 

Straightforward computations leads to 

zg\y e ~ Af(z}\ni,sf) (32) 



Note that all known conditional parameters have been omitted for brevity. 
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with 

fit = i/>?Up t + k(cx*) t (K + a 2 l N )- 1 (y i -V x Up e ) 
s 2 = a^-K^fiK + a 2 !^- 1 ^*). 

Since the posterior distribution ( [32] ) is Gaussian, the MAP and MMSE estimators of z* equal 
the posterior mean fx = (fix, ...,/i L ) T . 

In order to estimate the endmembers, we propose to replace the parameters X,U,s 2 and a 2 

( c ) ^ 

by their estimates X ,U,s 2 and a 2 and to compute the estimated hidden vectors associated 
with the abundance vectors ck* = [0^_ x , l,0^_ r ] T for r = 1, ...,R. For each value of r, the 
rth estimated hidden vector will be the rth estimated endmembeiQ Indeed, for the LMM 
and the bilinear models considered in this paper, the endmembers are obtained by setting 
a* = [0^_ 1; 1, 0^_ r ] T in the model ([2]) relating the observations to the abundances. Note that 
the proposed endmember estimation procedure provides the posterior distribution of each 



endmember via (32) which can be used to derive confidence intervals for the estimates. The 



next section presents some simulation results obtained for synthetic and real data. 

VI. Simulations 

A. Synthetic data 

The performance of the proposed GPLVM for dimensionality reduction is first evaluated 
on three synthetic images of N = 2500 pixels. The R = 3 endmembers contained in these 



images have been extracted from the spectral libraries provided with the ENVI software [ 30 1 
(i.e., green grass, olive green paint and galvanized steel metal). The first image I\ has been 
generated according to the linear mixing model (LMM). The second image I2 is distributed 
according to the bilinear mixing model introduced in [5], referred to as the "Fan model" 
(FM). The third image I3 has been generated according to the generalized bilinear model 
(GBM) studied in [[6) with the following nonlinearity parameters 

7i >2 = 0.9, 71,3 = 0.5, 72,3 = 0.3. 

The abundance vectors a n ,n = 1,...,N have been randomly generated according to a 
uniform distribution on the admissible set defined by the positivity and sum-to-one constraints 
([4]). The noise variance has been fixed to a 2 = 10~ 4 , which corresponds to a signal-to-noise 

3 Note that the estimated endmembers are centered since the data have previously been centered. The actual endmembers 
can be obtained by adding the empirical mean to the estimated endmembers. 
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ratio SNR ps 30dB which corresponds to the worst case for current spectrometers. The 
hyperparameter 7 of the latent variable prior ([T5|) has been fixed to 7 = 10 3 and the number 



of neighbors for the LLE is K = R for all the results presented in this paper. The quality 
of dimensionality reduction of the GPLVM can be measured by the average reconstruction 
error (ARE) defined as 



ARE 



\ 



1 N 

— Y 



(33) 



n=l 



where y n is the nth observed pixel and y n its estimate. For the LL-GPLVM, the nth estimated 



pixel is given by y n = PU 1/? [ic(n)] where P is estimated using ( |24] ). Table |l| compares the 
AREs obtained by the proposed LL-GPLVM and the projection onto the first (R—l) principal 
vectors provided by the principal component analysis (PCA). The proposed LL-GPLVM 
slightly outperforms PCA for nonlinear mixtures in term of ARE. More precisely, the AREs 
of the LL-GPLVM mainly consist of the noise errors (a 2 = 10~ 4 ), whereas model errors 
are added when applying PCA to nonlinear mixtures. Fig. [4] compares the latent variables 
obtained after maximization of pO) ) for the three images Ii to J 3 with the projections obtained 
by projecting the data onto the R—l principal vectors provided by PCA. Note that only R—l 
dimensions are needed to represent the latent variables (because of the sum-to-one constraint). 
From this figure, it can be seen that the latent variables of the LL-GPLVM describe a noisy 
simplex for the three images. It is not the case when using PCA for the nonlinear images. 
Fig. [5] shows the manifolds estimated by the LL-GPLVM for the three images 1\ to I3. This 
figure shows that the proposed LL-GPLVM can model the manifolds associated with the 
image pixels with good accuracy. The quality of unmixing procedures can also be measured 

table 1 

ARES: SYNTHETIC IMAGES. 





ARE (xlCT 2 ) 


h 


h 


h 


n 


IS 


T* 


PCA 


0.99 


1.08 


1.04 


1.00 


1.06 


1.03 


LL-GPLVM 


0.99 


0.99 


1.00 


1.00 


1.00 


0.99 


SU 


1.00 


1.13 


1.06 


1.14 


1.57 


1.12 


FCLL-GPLVM 


0.99 


0.99 


1.00 


1.00 


1.00 


0.99 



by comparing the estimated and actual abundances using the root normalized mean square 
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x 1 



Fig. 4. Top: Representation of the N = 2500 pixels (dots) using the first two principal components provided by the 
standard PC A for the three synthetic images 7i to I3. Bottom: Representation using the latent variables estimated by the 
LL-GPLVM for the three synthetic images 7i to 73. 



error (RNMSE) defined by 

N 



RNMSE 



\ 



— ^||a n - a n || 2 (34) 



NR 



where a n is the nth actual abundance vector and a n its estimate. Table [n] compares the 
RNMSEs obtained with different unmixing strategies. The endmembers have been estimated 
by the VCA algorithm in all simulations. The algorithms used for abundance estimation are 
the FCLS algorithm proposed in pTJ for J l5 the LS method proposed in (5J for I 2 and the 
gradient-based method proposed in Q for I 3 . These procedures are referred to as "SU" in 
the table. These strategies are compared with the proposed FCLL-GPLVM. As mentioned 
above, the Bayesian algorithm for joint estimation of A and V under positivity and sum-to- 



one constraints for A (introduced in [25 1) is used in this paper for the scaling step. It can 
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(a) h (LMM) 




(b) I 2 (FM) (c) h (GBM) 



Fig. 5. Visualization of the N — 2500 pixels (black dots) of I\, I2 and ^3 using the 3 axis provided by the PCA procedure. 
The colored surface is the manifold identified by the LL-GPLVM. 

be seen that the proposed FCLL-GPLVM is general enough to accurately approximate the 
considered mixing models since it provides the best results in term of abundance estimation. 



TABLE 11 

RNMSES: SYNTHETIC IMAGES. 





RNMSE (xlO -3 ) 


h 


h 


h 


n 




n 


su 


5.7 


1A 


22.7 


49.3 


86.6 


47.8 


FCLL-GPLVM 


3.9 


4.2 


5.4 


4.8 


7.2 


7.5 



The quality of reconstruction of the unmixing procedure is also evaluated by the ARE. For 
the FCLL-GPLVM, the nth reconstructed pixel y n is given by y n = PU i/> [£( c )(n)] . Table 
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[I] also shows the AREs corresponding to the different unmixing strategies. The proposed 
FCLL-GPLVM outperforms the other strategies in term of ARE for these images. 

Finally, the performance of the FCLL-GPLVM for endmember estimation is evaluated by 
comparing the estimated endmembers with the actual spectra. The quality of endmember 
estimation is evaluated by the spectral angle mapper (SAM) defined as 

(m r ,m r ) 



SAM = arccos 



(35) 



where m r is the rth actual endmember and m r its estimate. Table III compares the SAMs 
obtained for each endmember using the VCA algorithm, the nonlinear EEA presented in 
p4| (referred to as "Heylen") and the FCLL-GPLVM for the three images h to I 3 . These 
results show that the FCLL-GPLVM provides accurate endmember estimates for both linear 
and nonlinear mixtures. 



TABLE III 

SAMS (Xl0~ 2 ): SYNTHETIC IMAGES. 





VCA 


Heylen 


FCLL-GPLVM 




mi 


0.43 


1.94 


0.52 


h 


m 2 


0.22 


0.66 


0.86 




m 3 


0.22 


0.78 


0.15 




mi 


1.62 


0.75 


0.33 


h 


m 2 


2.08 


1.69 


0.53 




m 3 


1.15 


0.42 


0.34 




mi 


1.91 


1.80 


0.44 


h 


m 2 


1.36 


0.86 


0.58 




m 3 


0.88 


1.38 


0.30 



Finally, the performance of the proposed unmixing algorithm is tested in scenarios where 
pure pixels are not present in the observed scene. More precisely, the simulation parameters 
remain the same for the three images Ii to I 3 except for the N = 2500 abundance vectors, 
that are drawn from a uniform distribution in the following set 

|a|^a r = l, 0.9 > a r {n) > 0, Vr G {1, . . . , R} \ . (36) 

The three resulting images are denoted as Jj\ 1% and J|. Table [I] shows that the absence 
of pure pixels does not change the AREs significantly when they are compared with those 
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obtained with the images I x to J 3 . Moreover, FCLL-GPLVM is more robust to the absence of 
pure pixels than the different SU methods. The good performance of FCLL-GPVLM is due 
in part to the scaling procedure. Table [n] shows that the performance of the FCLL-GPLVM 
in term of RNMSE is not degraded significantly when there is no pure pixel in the image 
(note that the situation is different when the endmembers are estimated using VCA). Table 



IV shows the performance of the FCLL-GPLVM for endmember estimation when there are 



no pure pixels in the image. The results of the FCLL-GPLVM do not change significantly 
when they are compared with those obtained with images I\ to ^3, which is not the case 
for the two other EEAs. The accuracy of the endmember estimation is illustrated in Fig. [6] 
which compares the endmembers estimated by the FCLL-GPLVM (blue lines) to the actual 
endmember (red dots) and the VCA estimates (black line) for the image 1%. 




Fig. 6. Actual endmembers (red dots) and endmembers estimated by the FCLL-GPLVM (blue lines) and VCA (black line) 
for the image /J. 

The next section presents simulation results obtained for real data. 
B. Real data 

The real image considered in this section was acquired in 2010 by the Hyspex hyperspectral 
scanner over Villelongue, France. L = 160 spectral bands were recorded from the visible 
to near infrared with a spatial resolution of 0.5m. This dataset has already been studied in 
p2| and is mainly composed of different vegetation species. The sub-image of size 50 x 50 
pixels chosen here to evaluate the proposed unmixing procedure is depicted in Fig. [7] This 
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TABLE IV 

SAMS (xlCT 2 ): SYNTHETIC IMAGES. 





VCA 


Heylen 


FCLL-GPLVM 




mi 


2.87 


6.38 


0.38 


It 


m> 


2.15 


11.11 


1.30 




m 3 


2.10 


2.62 


0.24 




mi 


5.22 


7.53 


0.67 




m 2 


8.02 


9.59 


1.46 




m 3 


7.10 


2.48 


0.53 




mi 


6.89 


6.59 


0.61 


IS 


m 2 


6.03 


5.95 


1.75 




m 3 


3.73 


2.36 


0.48 




Fig. 7. Real hyperspectral data: Madonna data acquired by the Hyspex hyperspectral scanner over Villelongue, France 
(left) and the region of interest shown in true colors (right). 



image is mainly composed of three components since the data belong to a two-dimensional 
manifold (see black dots of Fig. [8] (a)). Consequently, we assume that the scene is composed 



of R = 3 endmembers. Using the ground truth used in [32|, we can determine some tree 
species present in the scene of interest. More precisely, Fig. [8] (a) shows the ground truth 
clusters corresponding to oak trees (red dots) and chestnut trees (blue dots) projected in a 
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3 -dimensional subspace (defined by the first three principal components of a PCA applied to 
the image of Fig. [7]). These two clusters are close to vertices of the data cloud. Consequently, 
oak and chestnut trees are identified as endmembers present in the image. Moreover, the third 



endmember is not a tree species according to the ground truth provided in [32|. In the sequel, 
this endmember will be referred to as Endmember jj3. 

The simulation parameters have been fixed to 7 = 10 3 and K = R. The latent variables 




-0.2 0.2 0.4 0.6 0.8 




Fig. 8. (a): Representation of the TV = 2500 pixels (black dots) of the Madonna image and the ground truth clusters 
corresponding to oak trees (red dots) and chestnut trees (blue dots) using the first three principal components provided by 
the standard PCA. (b): Representation of the TV = 2500 pixels (dots) of the Madonna data and manifold identified by the 
LL-GPLVM (colored surface). (c):Representation of the TV = 2500 pixels (dots) of the Madonna data and boundaries of 
the estimated transformed simplex (blue lines). 



obtained by maximizing the marginalized posterior distribution ( fT0| ) are depicted in Fig. [9] 
(blue dots). It can be seen from this figure that the latent variables seem to describe a noisy 
simplex. Fig. [8] (b) shows the manifold estimated by the proposed LL-GPLVM. This figure 
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0.32 




18 1 1 1 1 1 

0.1 0.15 0.2 0.25 0.3 0.35 

X 1 

Fig. 9. Representation of the N = 2500 latent variables (dots) estimated by the LL-GPLVM and the simplex identified 
by the scaling step (red lines) for the Madonna data. 



illustrates the capacity of the LL-GPLVM for modeling the nonlinear manifold. Table |Vj (left) 
compares the AREs obtained by the proposed LL-GPLVM and the projection onto the first 
i? — 1 = 2 principal vectors provided by PC A. The proposed LL-GPLVM slightly outperforms 
PCA for the real data of interest, which shows that the proposed nonlinear dimensionality 
reduction method is more accurate than PCA (linear dimensionality reduction) in representing 



the data. The scaling step presented in Section IV is then applied to the estimated latent 
variables. The estimated simplex defined by the latent variables is depicted in Fig. [9] (red 
lines). Fig. [8] (c) compares the boundaries of the estimated transformed simplex with the 



image pixels. The abundance maps obtained after the scaling step are shown in Fig. 10 (top). 
The results of the unmixing procedure using the FCLL-GPLVM are compared to an unmixing 
strategy assuming the LMM. More precisely, we use VCA to extract the endmembers from 
the data and use the FLCS algorithm for abundance estimation. The estimated abundance 
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maps are depicted in Fig. [TO] (bottom). The abundance maps obtained by the two methods 
are similar which shows the accuracy of the proposed unmixing strategy when considering 
the LMM as a first order approximation of the mixing model. 

TABLE V 
ARES: REAL IMAGE (xlO -2 ). 



PCA 


LL-GPLVM 


VCA+FCLS 


FCLL-GPLVM 


0.84 


0.79 


1.30 


1.11 



Moreover, Fig. [TT] shows the classification map obtained in [32| for the region of interest. 
The unclassified pixels correspond to areas where the classification method of |32j has not 
been performed. Even if lots of pixels are not classified, the classified pixels can be compared 
with the estimated abundance maps. First, we can note the presence of the same tree species in 
the classification and abundance maps, i.e., oak and chestnut. We can also see that the pixels 
composed of chestnut trees and Endmember JJ3 are mainly located in the unclassified regions, 
which explains why they do not appear clearly in the classification map. Only one pixel is 
classified as being composed of ash trees in the region of interest. If unclassified pixels also 
contain ash trees, they are either too few or too mixed to be considered as mixtures of an 



additional endmember in the image. Finally, it can be seen from Figs. 10 and 11 that oak 
trees are located within similar regions (left corners and top right corner) for the abundance 
and classification maps. 

Evaluating the performance of endmember estimation on real data is an interesting problem. 
However, comparison of the estimated endmembers with the ground truth is difficult here. 
First, since the nature of Endmember jj3 is unknown, no ground truth is available for this 
endmember. Second, because of the variability of the ground truth spectra associated with 
each tree species, it is difficult to show whether VCA or the proposed FCLL-GPLVM provides 
the best endmember estimates. However, the AREs obtained for both methods (Table |V] right) 
show that the FCLL-GPLVM fits the data better than the linear SU strategy, which confirms 
the importance of the proposed algorithm for nonlinear spectral unmixing. 

VII. Conclusions 

We proposed a new algorithm for nonlinear spectral unmixing based on a Gaussian process 
latent variable model. The unmixing procedure assumed a nonlinear mapping from the 
abundance space to the observed pixels. It also considered the physical constraints for the 
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Chestnut tree Oak tree Endmember#3 




Fig. 10. Top: Abundance maps estimated using the FCLL-GPLVM for the Madonna image. Bottom: Abundance maps 
estimated using the VCA algorithm for endmember extraction and the FCLS algorithm for abundance estimation. 

abundance vectors. The abundance estimation was decomposed into two steps. Dimensionality 
reduction was first achieved using latent variables. A scaling procedure was then proposed 
to estimate the abundances. After estimating the abundance vectors of the image, a new 
endmember estimator based on Gaussian process regression was investigated. Simulations 
conducted on synthetic and real images illustrated the flexibility of the proposed model for 
linear and nonlinear spectral unmixing and provided promising results for abundance and 
endmember estimations in spite of the absence of pure pixels in the image. The choice of the 
nonlinear mapping used for the GP model is an important issue to ensure that the LL-GPLVM 
is general enough to handle different nonlinearities. In particular, different mappings could 
be used for intimate mixtures. Moreover, including the GPLVM in a Bayesian framework 
could also be an interesting prospect to consider spatial correlation through Markov random 
fields or GPs. 
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Fig. 11. Classification map obtained in |32| for the region of interest of the Madonna image. 
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